-
Notifications
You must be signed in to change notification settings - Fork 618
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add possibility to save trie nodes while re-applying blocks #11811
Conversation
Tested with the following procedure:
|
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #11811 +/- ##
==========================================
- Coverage 71.72% 71.68% -0.05%
==========================================
Files 803 804 +1
Lines 163763 163996 +233
Branches 163763 163996 +233
==========================================
+ Hits 117466 117564 +98
- Misses 41231 41362 +131
- Partials 5066 5070 +4
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm very confused :)
You're storing the trie changes (DBCol::TrieChanges
) but as far as I know this column is only used for garbage collection and cold migration. In particular I don't think it's used for answering rpc or viewing state with neard subcommands. I thought you'll be storing the trie nodes directly into the DBCol::State
column which is where I believe the data is missing. That being said it seems like it worked somehow :) Can you have a look into how this works and explain it briefly for me?
Yes, I'm saving the trie changes explicitly although I believe it isn't strictly required. |
Updated PR:
TODO:
|
On the positive side: recovered data is still present after leaving the node running over the weekend. Regenerating trie nodes is idempotent. Not so positive: Overall I haven't found any inherent issue with this PR |
core/store/src/db/recoverydb.rs
Outdated
|
||
/// Returns whether the operation should be kept or dropped. | ||
fn keep_db_op(&self, op: &mut DBOp) -> bool { | ||
let overwrites_same_data = |col: &mut DBCol, key: &mut Vec<u8>, value: &mut Vec<u8>| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure about this pattern of read-before-write to avoid re-writing the entire DB. It should help in theory, but I don't see any speed up in practice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, at least it helps preventing the so-common write stalls.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it may not help now but it would be useful to have later. For example you can use this to write only the missing data to a separate db that can later be shared with node operators for recovery.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks, this is great effort and a big leap!
I left a few nits but otherwise it's ready to go.
core/store/src/db/recoverydb.rs
Outdated
self.cold.iter_range(col, lower_bound, upper_bound) | ||
} | ||
|
||
/// Atomically applies operations in given transaction. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you also mention that it filters / adapts operations?
core/store/src/db/recoverydb.rs
Outdated
|
||
/// Returns whether the operation should be kept or dropped. | ||
fn keep_db_op(&self, op: &mut DBOp) -> bool { | ||
let overwrites_same_data = |col: &mut DBCol, key: &mut Vec<u8>, value: &mut Vec<u8>| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Can you refactor it to a helper?
core/store/src/db/recoverydb.rs
Outdated
|
||
/// Returns whether the operation should be kept or dropped. | ||
fn keep_db_op(&self, op: &mut DBOp) -> bool { | ||
let overwrites_same_data = |col: &mut DBCol, key: &mut Vec<u8>, value: &mut Vec<u8>| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it may not help now but it would be useful to have later. For example you can use this to write only the missing data to a separate db that can later be shared with node operators for recovery.
core/store/src/db/recoverydb.rs
Outdated
if col.is_rc() { | ||
if let Ok(Some(old_value)) = self.get_with_rc_stripped(*col, &key) { | ||
let value = DBSlice::from_vec(value.clone()).strip_refcount(); | ||
if let Some(value) = value { | ||
if value == old_value { | ||
return true; | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically, since you are only interested in State, you can
- Assert that is_rc() is true and not worry about the other case
- Only check that the key exists, without loading the value into memory to speed up the process. Other than the reference count the value at a key cannot be changed because the key is a hash of the value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting, I've made the method a bit smarter with this new knowledge
core/store/src/db/recoverydb.rs
Outdated
if !matches!(col, DBCol::State) { | ||
return false; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: You can move that check to before the match to make it a bit slimmer here. You can use the col
method on DBOp to do that.
core/store/src/db/recoverydb.rs
Outdated
} | ||
|
||
/// Returns whether the operation should be kept or dropped. | ||
fn keep_db_op(&self, op: &mut DBOp) -> bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure you need mutable here? It seems like it shouldn't be necessary.
tools/state-viewer/src/cli.rs
Outdated
@@ -213,17 +223,27 @@ pub struct ApplyCmd { | |||
shard_id: ShardId, | |||
#[clap(long, default_value = "trie")] | |||
storage: StorageSource, | |||
/// Save the trie nodes generated by applying the block into the selected store (hot or cold). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you also mention that it will only write to the state column?
Also maybe worth renaming to save_state
or similar?
Adds a new argument to
apply
andapply range
commands to save computed trie nodes into hot or cold storage. Normally, this operation should be idempotent, but it can also be used to reconstruct missing trie nodes.